new code
Diff-XYZ: A Benchmark for Evaluating Diff Understanding
Glukhov, Evgeniy, Conti, Michele, Bogomolov, Egor, Golubev, Yaroslav, Bezzubov, Alexander
Reliable handling of code diffs is central to agents that edit and refactor repositories at scale. We introduce Diff-XYZ, a compact benchmark for code-diff understanding with three supervised tasks: apply (old code $+$ diff $\rightarrow$ new code), anti-apply (new code $-$ diff $\rightarrow$ old code), and diff generation (new code $-$ old code $\rightarrow$ diff). Instances in the benchmark are triples $\langle \textit{old code}, \textit{new code}, \textit{diff} \rangle$ drawn from real commits in CommitPackFT, paired with automatic metrics and a clear evaluation protocol. We use the benchmark to do a focused empirical study of the unified diff format and run a cross-format comparison of different diff representations. Our findings reveal that different formats should be used depending on the use case and model size. For example, representing diffs in search-replace format performs best for larger models across most tasks, while structured udiff variants offer similar but slightly weaker performance. In contrast, smaller open models benefit little from any formatting choice. The Diff-XYZ benchmark is a reusable foundation for assessing and improving diff handling in LLMs that can aid future development of diff formats and models editing code. The dataset is published on HuggingFace Hub: https://huggingface.co/datasets/JetBrains-Research/diff-xyz.
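The three tasks named in the abstract can be illustrated with a minimal sketch. This is not the benchmark's evaluation code: `generate_diff` and `patch` are hypothetical helper names, the unified diff is produced with Python's standard `difflib`, and the small applier assumes a single-file diff over text whose lines all end in a newline.

```python
import difflib
import re

# Matches unified-diff hunk headers such as "@@ -1,2 +1,3 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def generate_diff(old: str, new: str) -> str:
    """Diff generation: (old code, new code) -> unified diff."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="old",
            tofile="new",
        )
    )


def patch(source: str, diff: str, reverse: bool = False) -> str:
    """Apply (old + diff -> new) or, with reverse=True, anti-apply (new - diff -> old)."""
    src = source.splitlines(keepends=True)
    out = []
    pos = 0  # index of the next source line not yet copied
    # In reverse mode the roles of '+' and '-' lines are swapped.
    added, removed = ("-", "+") if reverse else ("+", "-")
    in_hunk = False
    for line in diff.splitlines(keepends=True):
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(3 if reverse else 1))
            count = match.group(4 if reverse else 2)
            length = 1 if count is None else int(count)
            # An empty range names the line just before the hunk.
            hunk_start = start if length == 0 else start - 1
            out.extend(src[pos:hunk_start])  # copy untouched lines before the hunk
            pos = hunk_start
            in_hunk = True
        elif not in_hunk:
            continue  # skip the '---' / '+++' file headers
        elif line.startswith(added):
            out.append(line[1:])
        elif line.startswith(removed):
            pos += 1  # drop this source line
        else:
            out.append(src[pos])  # context line: keep the source line
            pos += 1
    out.extend(src[pos:])  # copy everything after the last hunk
    return "".join(out)


old_code = "def add(a, b):\n    return a - b\n"
new_code = "def add(a, b):\n    return a + b\n"
diff = generate_diff(old_code, new_code)   # diff generation
applied = patch(old_code, diff)            # apply
reverted = patch(new_code, diff, reverse=True)  # anti-apply
```

In this sketch, apply and anti-apply share one routine because a unified diff is symmetric: swapping the meaning of the `+` and `-` lines turns one task into the other.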
Response to Reviewer 1
We thank the reviewers for their insightful comments. Please find the detailed responses below. TurboAE's runtime and computation both scale linearly with the block length. In the revision, TurboAE and traditional turbo codes are run on both GPU and CPU. We agree with the reviewer that the statement is confusing and misleading.
Microsoft CEO claims 30% of its new code is written by AI
Generative 'AI' isn't just useful for making bad writing and bad images, it can be used to make software code, too. In fact, Microsoft's CEO claims that up to 30 percent of the company's new code is now created with artificial intelligence. Satya Nadella made this claim at LlamaCon (around the 45-minute mark), Meta/Facebook's conference focusing on generative AI tools. In fact, Nadella was opposite Mark Zuckerberg, Facebook founder and controversy lightning rod, when he said as much yesterday. "Code reviews are very high," says Nadella. "In fact the agents we have for reviewing code, that usage has increased, and so I would say maybe 20, 30 percent of the code that is inside of our repos today and in some of our projects are probably all written by software." That's a pretty stunning claim, and as Tom's Hardware points out, it seems in line with a similar claim Google CEO Sundar Pichai made last year.
Google CEO says a quarter of the company's new code is already AI generated
Google CEO Sundar Pichai just revealed that AI now generates more than a quarter of new code for its products, according to a company earnings call transcribed by Ars Technica. In other words, AI tools are already having an absolutely mammoth impact on the development of software. Pichai did say that human programmers oversee the computer-generated code, which is something. The CEO noted that AI coding helps with "boosting productivity and efficiency," ensuring that engineers "do more and move faster." According to Stack Overflow's 2024 Developer Survey, over 75 percent of respondents are already using or are "planning to use" AI tools to assist with software development.
Thematic Analysis with Open-Source Generative AI and Machine Learning: A New Method for Inductive Qualitative Codebook Development
Katz, Andrew, Fleming, Gabriella Coloyan, Main, Joyce
This paper aims to answer one central question: to what extent can open-source generative text models be used in a workflow to approximate thematic analysis in social science research? To answer this question, we present the Generative AI-enabled Theme Organization and Structuring (GATOS) workflow, which uses open-source machine learning techniques, natural language processing tools, and generative text models to facilitate thematic analysis. To establish validity of the method, we present three case studies applying the GATOS workflow, leveraging these models and techniques to inductively create codebooks similar to traditional procedures using thematic analysis. Specifically, we investigate the extent to which a workflow comprising open-source models and tools can inductively produce codebooks that approach the known space of themes and sub-themes. To address the challenge of gleaning insights from these texts, we combine open-source generative text models, retrieval-augmented generation, and prompt engineering to identify codes and themes in large volumes of text, i.e., generate a qualitative codebook. The process mimics an inductive coding process that researchers might use in traditional thematic analysis by reading text one unit of analysis at a time, considering existing codes already in the codebook, and then deciding whether or not to generate a new code based on whether the extant codebook provides adequate thematic coverage. We demonstrate this workflow using three synthetic datasets from hypothetical organizational research settings: a study of teammate feedback in teamwork settings, a study of organizational cultures of ethical behavior, and a study of employee perspectives about returning to their offices after the pandemic. We show that the GATOS workflow is able to identify themes in the text that were used to generate the original synthetic datasets.
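The inductive loop described in the abstract (read one unit, compare it with the existing codes, add a new code only when coverage is inadequate) can be sketched as follows. This is not the GATOS implementation: the paper uses generative text models and retrieval-augmented generation for the coverage decision, whereas this sketch substitutes a simple word-overlap (Jaccard) score so that it runs without any model. The names `build_codebook`, `tokenize`, `jaccard`, and the `threshold` value are all assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding or LLM reading."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_codebook(units: list[str], threshold: float = 0.3) -> list[dict]:
    """Process units one at a time, creating a new code only when no
    existing code covers the unit well enough (score below threshold)."""
    codebook: list[dict] = []
    for unit in units:
        tokens = tokenize(unit)
        scored = [(jaccard(tokens, code["tokens"]), code) for code in codebook]
        best_score, best_code = max(
            scored, key=lambda pair: pair[0], default=(0.0, None)
        )
        if best_code is not None and best_score >= threshold:
            # The existing codebook gives adequate coverage: reuse the code.
            best_code["examples"].append(unit)
        else:
            # No adequate coverage: generate a new code from this unit.
            codebook.append({"label": unit, "tokens": tokens, "examples": [unit]})
    return codebook


units = [
    "My teammate never replies to messages",
    "My teammate never replies to emails",
    "The office commute is too long",
]
codebook = build_codebook(units)
```

Each code keeps the token set of the unit that created it, so the result depends on the order in which units are read; in a model-based workflow the coverage check and the code label would come from the generative model instead.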
Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of Teaching
Katz, Andrew, Gerhardt, Mitchell, Soledad, Michelle
Feedback is a critical aspect of improvement. Unfortunately, when there is a lot of feedback from multiple sources, it can be difficult to distill the information into actionable insights. Consider student evaluations of teaching (SETs), which are important sources of feedback for educators. They can give instructors insights into what worked during a semester. A collection of SETs can also be useful to administrators as signals for courses or entire programs. However, on a large scale as in high-enrollment courses or administrative records over several years, the volume of SETs can render them difficult to analyze. In this paper, we discuss a novel method for analyzing SETs using natural language processing (NLP) and large language models (LLMs). We demonstrate the method by applying it to a corpus of 5,000 SETs from a large public university. We show that the method can be used to extract, embed, cluster, and summarize the SETs to identify the themes they express. More generally, this work illustrates how to use the combination of NLP techniques and LLMs to generate a codebook for SETs. We conclude by discussing the implications of this method for analyzing SETs and other types of student writing in teaching and research settings.
How Generative AI Could Lead to a 10x Increase in Coding Productivity
In its recent "Big Ideas 2023" report, investment management firm Ark Invest forecast that AI could lead to a 10-fold increase in coding productivity. Based on a 70% annualized drop in training costs and feedback loops, AI coding assistants like Copilot could increase the output of software engineers 10-fold by 2030. Generative AI has the potential to revolutionize the coding process and significantly increase productivity. By using deep learning algorithms, generative AI can learn from large datasets of code and generate new code that is syntactically and semantically correct. This can significantly reduce the time and effort required to write new code, especially for routine tasks that require repetitive coding patterns.
13 Best Code Review Tools for Developers (2023 Edition)
Code review is a part of the software development process which involves testing the source code to identify bugs at an early stage. A code review process is typically conducted before merging with the codebase. An effective code review prevents bugs and errors from getting into your project by improving code quality at an early stage of the software development process. In this post, we'll explain what code review is and explore popular code review tools that help organizations with the code review process. The primary goal of the code review process is to assess any new code for bugs, errors, and quality standards set by the organization. The code review process should not just consist of one-sided feedback. Therefore, an intangible benefit of the code review process is the collective team's improved coding skills. If you would like to initiate a code review process in your organization, you should first decide who would review the code. If you belong to a small team, you may assign team leads to review all code.
Google is testing a new robot that can program itself
Writing working code can be a challenge. Even relatively easy languages like HTML require the coder to understand the specific syntax and available tools. Writing code to control robots is even more involved and often has multiple steps: There's code to detect objects, code to trigger the actuators that move the robot's limbs, code to specify when the task is complete, and so on. Something as simple as programming a robot to pick up a yellow block instead of a red one is impossible if you don't know the coding language the robot runs on. But Google's robotics researchers are exploring a way to fix that.
Will AI Make Coding Obsolete?
Coding has become a critical skill for many jobs. Some countries and schools are even considering coding languages to be an acceptable form of a foreign language. In the midst of all this, the nature of code is changing dramatically. Low-code and no-code platforms are demonstrating aggressive growth and making it possible for individuals and organizations to create powerful production applications with relatively small amounts of what would traditionally be called coding. The next step in this trend is AI generating code, as recently demonstrated by OpenAI Codex and GitHub Copilot.